Objectives

Maps are an extremely important part of any military operation, and timely, accurate maps are
an essential planning tool. Ocean-floor or terrain data are static and spatial, while
meteorological or visibility data are dynamic and spatio-temporal. The data upon which maps
are based can be simultaneously massive and sparse, and they are noisy. In the presence of
uncertainty due to missingness and measurement-error noise, spatial and spatio-temporal
statistical analysis of massive datasets is challenging. The massiveness causes problems in
computing optimal spatial predictors, such as kriging, since one has to solve (and store) systems
of equations whose dimension equals the number of data. In addition, a large spatial domain is
often associated with non-stationary behavior over that domain. The objectives are: (1) construct
a flexible family of non-stationary covariance functions using a truncated set of basis functions,
fixed in number; (2) develop the necessary methodology and algorithms for covariance-parameter
estimation; (3) derive optimal spatial or spatio-temporal maps that account for uncertainties
statistically; and (4) incorporate spatial and spatio-temporal dependencies into the analysis of
sensor-network data.

Impact/Applications 

The US Navy has a great need for statistical processing to produce current maps and to forecast
spatial fields in a rapidly changing environment. The massive-dataset-mapping technology
called Fixed Rank Kriging (FRK) has now been published, along with several applications. The
main paper for spatial prediction, by Cressie and Johannesson, was published in 2008 in the
Journal of the Royal Statistical Society, Series B, one of the top three statistics journals in the
world. A paper on spatio-temporal prediction applied to remote-sensing data from satellites, by
Shi and Cressie, was published in 2007 in Environmetrics. The statistical models of dependence
are highly flexible and the computational algorithm is extremely fast (linear in the number of
data), providing the mapping community with the means of making complete maps of both the
surface (kriging) and its uncertainty (kriging variance).

We  have  developed  two  web  sites  at  The  Ohio  State  University: 
www.stat.osu.edu/~C2  considers  probability  and  statistics  in  Command  and  Control. 
www.stat.osu.edu/~sses/research_mds.html shows the FRK mapping approach that fills in
missing data and smooths out noise. The application given there is to global mapping of
aerosol optical depth data obtained from the MISR instrument on the Terra satellite.

During  the  period  of  the  grant,  a  two-year  contract  was  signed  with  Oak  Ridge  National 
Laboratory  (ORNL)  to  incorporate  spatial  and  spatio-temporal  dependencies  into  the  analysis  of 
sensor-network  data.  This  allowed  a  postdoctoral  fellow  to  be  supported  jointly  by  this  ONR 
grant and by the ORNL contract.
Personnel

Principal Investigator:  Noel Cressie, PhD
Postdoctoral Fellow:     Chungfeng Huang, PhD (partial ONR support)
Research Assistants:     Yonggang Yao
                         Hongfei Li (partial ONR support)
                         Lei Kang

Technical Approach

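This research is focused on mapping from data observed at many locations distributed through
space. The spatial dependence is captured through a set of $r$ (not necessarily orthogonal) basis
functions,

$$\mathbf{S}(\mathbf{u}) = (S_1(\mathbf{u}), \ldots, S_r(\mathbf{u}))' ; \quad \mathbf{u} \in \mathbb{R}^d, \qquad (1)$$

where $r$ is fixed and the setting is a $d$-dimensional Euclidean space. For any $r \times r$
positive-definite matrix $\mathbf{K}$, the covariance function between two elements $Y(\mathbf{u})$ and
$Y(\mathbf{v})$ of the spatial process $\{Y(\mathbf{s}) : \mathbf{s} \in D \subset \mathbb{R}^d\}$ is assumed to be

$$C(\mathbf{u}, \mathbf{v}) = \mathbf{S}(\mathbf{u})' \mathbf{K} \mathbf{S}(\mathbf{v}). \qquad (2)$$

Associated with the covariance function (2) is a spectral representation for $Y(\cdot)$, given by

$$Y(\mathbf{s}) = \mathbf{S}(\mathbf{s})' \boldsymbol{\nu} ; \quad \mathbf{s} \in D, \qquad (3)$$

where $\boldsymbol{\nu}$ is an $r$-dimensional random vector such that $\mathrm{var}(\boldsymbol{\nu}) = \mathbf{K}$. We assume that we
have $n$ measurements of $Y(\cdot)$ that provide data $\mathbf{Z} = (Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))'$ at locations
$\mathbf{s}_1, \ldots, \mathbf{s}_n$, where

$$Z(\mathbf{s}_i) = Y(\mathbf{s}_i) + \varepsilon(\mathbf{s}_i) ; \quad i = 1, \ldots, n, \qquad (4)$$

and $\varepsilon(\cdot)$ is a mean-zero, variance-$\sigma_\varepsilon^2$, white-noise measurement process independent of $Y(\cdot)$.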
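
As a rough illustration of the model in (1)-(4), the sketch below simulates noisy data from a low-rank spatial process. The Gaussian-bump basis functions, their centers and bandwidth, the particular matrix K, the noise standard deviation, and all function names are assumptions made for the example; the text leaves the choice of basis functions open.

```python
import numpy as np


def basis_matrix(locs, centers, bandwidth):
    """Evaluate r Gaussian-bump basis functions at n locations.

    Returns an (n, r) matrix whose i-th row is S(s_i)'.
    The Gaussian form is an illustrative assumption, not prescribed by the model.
    """
    # Squared distance between every location and every basis-function center.
    d2 = ((locs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / bandwidth**2)


def simulate_data(locs, centers, bandwidth, K, sigma_eps, rng):
    """Draw the hidden process S v with var(v) = K, then add white noise as in (4)."""
    S = basis_matrix(locs, centers, bandwidth)
    v = rng.multivariate_normal(np.zeros(K.shape[0]), K)
    y = S @ v                                            # hidden process at the data locations
    z = y + sigma_eps * rng.standard_normal(len(locs))   # noisy measurements
    return S, y, z


rng = np.random.default_rng(0)
n, r = 500, 8
locs = rng.uniform(0.0, 1.0, size=(n, 1))        # n locations in d = 1 dimension
centers = np.linspace(0.0, 1.0, r)[:, None]      # hypothetical basis-function centers
A = rng.standard_normal((r, r))
K = A @ A.T + r * np.eye(r)                      # an arbitrary positive-definite r x r matrix
S, y, z = simulate_data(locs, centers, 0.15, K, 0.5, rng)
C = S @ K @ S.T                                  # covariance (2) evaluated at all pairs of data locations
```

Although C is 500 x 500 here, it is built from an 8 x 8 matrix K and so has rank at most 8, which is the structure that the fixed-rank approach exploits.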

By using the flexible model (2) and Fixed Rank Kriging, which is an optimal-spatial-prediction
methodology (Cressie and Johannesson, 2008), massive datasets can be processed in $O(n)$ flops,
and inversion of only an $r \times r$ matrix is needed for each spatial-prediction location; recall that
$r$ is fixed.
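
For concreteness, the predictor implied by (2)-(4) can be written out. The display below is a sketch under the added assumption that the random vector in (3) has mean zero (simple kriging). Here $Y(\cdot)$ denotes the hidden process in (3), $\mathbf{s}_0$ a prediction location, and $\mathbf{S}$ the $n \times r$ matrix whose $i$-th row is $\mathbf{S}(\mathbf{s}_i)'$; the symbols $\hat{Y}$ and $\sigma_k^2$ are notation introduced for the example.

```latex
\begin{align*}
\mathrm{cov}(\mathbf{Z}, Y(\mathbf{s}_0)) &= \mathbf{S}\mathbf{K}\mathbf{S}(\mathbf{s}_0), \\
\hat{Y}(\mathbf{s}_0) &= \mathbf{S}(\mathbf{s}_0)'\mathbf{K}\mathbf{S}'\,\{\mathrm{var}(\mathbf{Z})\}^{-1}\mathbf{Z}, \\
\sigma_k^2(\mathbf{s}_0) &= \mathbf{S}(\mathbf{s}_0)'\mathbf{K}\mathbf{S}(\mathbf{s}_0)
  - \mathbf{S}(\mathbf{s}_0)'\mathbf{K}\mathbf{S}'\,\{\mathrm{var}(\mathbf{Z})\}^{-1}\mathbf{S}\mathbf{K}\mathbf{S}(\mathbf{s}_0).
\end{align*}
```

The first line follows from (3) and (4) because the measurement noise is independent of the hidden process. The only $n \times n$ object in these expressions is the inverse of $\mathrm{var}(\mathbf{Z})$; every other factor has a dimension equal to $r$.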

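FRK relies fundamentally on inversion of the $n \times n$ covariance matrix of the data, namely

$$\mathrm{var}(\mathbf{Z}) = \mathbf{S}\mathbf{K}\mathbf{S}' + \sigma_\varepsilon^2 \mathbf{I},$$

where $\mathbf{S}$ is the $n \times r$ matrix $(\mathbf{S}(\mathbf{s}_1), \ldots, \mathbf{S}(\mathbf{s}_n))'$ and $\mathbf{I}$ is the $n \times n$ identity matrix. Under
normal circumstances, inversion of an $n \times n$ covariance matrix takes $O(n^3)$ flops. FRK reduces
this to $O(n)$ by invoking the following formula:

$$(\mathrm{var}(\mathbf{Z}))^{-1} = (1/\sigma_\varepsilon^2)\mathbf{I} - (1/\sigma_\varepsilon^4)\mathbf{S}\{\mathbf{K}^{-1} + (1/\sigma_\varepsilon^2)\mathbf{S}'\mathbf{S}\}^{-1}\mathbf{S}',$$

where all matrix inverses are of $r \times r$ matrices and $r$ is fixed.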
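
The inverse formula above can be checked numerically. The sketch below, with hypothetical function names and randomly generated inputs, applies the formula without forming the n x n matrix, compares the result with a direct dense solve, and then uses it to compute a kriging predictor and kriging variance. The zero-mean (simple-kriging) setting is an assumption of the example.

```python
import numpy as np


def woodbury_solve(S, K, sigma2, B):
    """Return var(Z)^{-1} B, where var(Z) = S K S' + sigma2 * I.

    Only r x r systems are solved; the n x n matrix is never formed.
    """
    M = np.linalg.inv(K) + (S.T @ S) / sigma2        # r x r matrix K^{-1} + S'S / sigma2
    correction = S @ np.linalg.solve(M, S.T @ B)     # S M^{-1} S' B
    return B / sigma2 - correction / sigma2**2


def frk_predict(S, K, sigma2, z, S0):
    """Zero-mean kriging predictor and kriging variance at m prediction locations.

    S0 is an (m, r) matrix whose rows are the basis functions at those locations.
    """
    c0 = S @ K @ S0.T                                # (n, m): cov(Z, hidden process at s0)
    w = woodbury_solve(S, K, sigma2, c0)             # var(Z)^{-1} c0
    pred = w.T @ z
    prior_var = np.einsum("ij,jk,ik->i", S0, K, S0)  # S(s0)' K S(s0) for each s0
    krig_var = prior_var - np.sum(c0 * w, axis=0)
    return pred, krig_var


rng = np.random.default_rng(1)
n, r, sigma2 = 300, 5, 0.25
S = rng.standard_normal((n, r))                      # stand-in for the n x r basis matrix
A = rng.standard_normal((r, r))
K = A @ A.T + np.eye(r)                              # arbitrary positive-definite K
z = rng.standard_normal(n)                           # stand-in data vector

Sigma = S @ K @ S.T + sigma2 * np.eye(n)             # dense n x n matrix, built only for the check
direct = np.linalg.solve(Sigma, z)
fast = woodbury_solve(S, K, sigma2, z)
max_abs_diff = np.max(np.abs(direct - fast))

pred, krig_var = frk_predict(S, K, sigma2, z, S[:3])
```

For a fixed number of prediction locations, the cost of `woodbury_solve` is dominated by forming S'S, which grows linearly in n for fixed r; the dense solve is included only to confirm that the two routes agree.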

The temporal component can be incorporated as follows. Let $\{Y(\mathbf{s}; t) : \mathbf{s} \in D \subset \mathbb{R}^d, \; t = 1, 2, \ldots\}$
denote a process to be sensed; suppose that data are obtained at locations $\mathbf{s}_1, \ldots, \mathbf{s}_n$ and at
successive times $t = 1, 2, \ldots$. Using the same terminology as for the spatial case, we wish to
predict $Y(\cdot; t_0)$ based on data

$$\mathbf{Z}(t) = (Z(\mathbf{s}_1; t), \ldots, Z(\mathbf{s}_n; t))' ; \quad t = 1, \ldots, t_0. \qquad (5)$$

Adding the time component also allows us to pose the problem of forecasting $Y(\cdot; t_0 + 1)$. Key to
obtaining solutions to the prediction and forecasting problems is the assumption of a
spatio-temporal model.

We generalize (3) by assuming that

$$Y(\mathbf{s}; t) = \mathbf{S}(\mathbf{s})' \boldsymbol{\nu}(t) ; \quad \mathbf{s} \in D, \; t = 1, 2, \ldots, \qquad (6)$$

where $\{\boldsymbol{\nu}(t) : t = 1, 2, \ldots\}$ is a temporal stochastic process. For example, if $\boldsymbol{\nu}(1), \boldsymbol{\nu}(2), \ldots$ are
independent and identically distributed, the data $\mathbf{Z}(1), \mathbf{Z}(2), \ldots, \mathbf{Z}(t_0)$ could be viewed as
arising from independent realizations of the spatial process $Y(\cdot)$ given by (3). Another example
that we find very interesting is that of early detection of bioterrorism events, where we assume a
dynamic model for $\boldsymbol{\nu}(\cdot)$. Wikle and Cressie (1999) did this in a climate context, and their basis
functions were empirical orthogonal functions. The model (6) is more general and offers the
opportunity of early detection by continually testing for a regime shift in the dynamic model
for $\boldsymbol{\nu}(\cdot)$.
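
One simple dynamic model for the coefficient process in (6), assumed here purely for illustration, is a first-order vector autoregression v(t) = H v(t-1) + xi(t) with var(xi(t)) = U. Under that assumption and Gaussian errors, a Kalman filter written in information form needs only r x r inverses at each time step. The names H, U, K0 and the function below are hypothetical, and the sketch does not attempt the regime-shift test mentioned above.

```python
import numpy as np


def filter_and_forecast(Z, S, H, U, K0, sigma2):
    """Filter the r-dimensional coefficients and forecast them one step ahead.

    Z is a (t0, n) array of data vectors; S is the (n, r) basis matrix.
    K0 is the assumed variance of the coefficients before the first observation.
    Returns the filtered means (t0, r) and the mean and variance of v(t0 + 1).
    """
    r = S.shape[1]
    StS = S.T @ S / sigma2                  # r x r, computed once
    m, P = np.zeros(r), K0
    means = []
    for z in Z:
        # Prediction step under the assumed autoregression.
        m_pred = H @ m
        P_pred = H @ P @ H.T + U
        # Update step in information form: only r x r inverses are needed.
        P_pred_inv = np.linalg.inv(P_pred)
        P = np.linalg.inv(P_pred_inv + StS)
        m = P @ (P_pred_inv @ m_pred + S.T @ z / sigma2)
        means.append(m)
    return np.array(means), H @ m, H @ P @ H.T + U


rng = np.random.default_rng(2)
n, r, t0, sigma2 = 200, 4, 30, 0.25
S = rng.standard_normal((n, r))             # stand-in for the n x r basis matrix
K0 = np.eye(r)
H = 0.9 * np.eye(r)
U = K0 - H @ K0 @ H.T                       # keeps var(v(t)) equal to K0 for every t

# Simulate coefficients and data from the assumed model.
v = rng.multivariate_normal(np.zeros(r), K0)
Z = np.empty((t0, n))
for t in range(t0):
    v = H @ v + rng.multivariate_normal(np.zeros(r), U)
    Z[t] = S @ v + np.sqrt(sigma2) * rng.standard_normal(n)

means, m_next, P_next = filter_and_forecast(Z, S, H, U, K0, sigma2)
forecast_map = S @ m_next                   # forecast of the process at time t0 + 1, at the data locations
```

Multiplying the forecast coefficients by the basis functions at any set of locations gives a forecast map at those locations; the data locations are used here only for convenience.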

References 

Cressie,  N.  and  Johannesson,  G.  (2008).  Fixed  rank  kriging  for  very  large  spatial  data  sets. 
Journal  of  the  Royal  Statistical  Society,  Series  B,  70,  1-18. 

Wikle,  C.  K.  and  Cressie,  N.  (1999).  A  dimension-reduced  approach  to  space-time  Kalman 
filtering.  Biometrika,  86,  815-829. 
Results

Listed  below  are  the  presentations  and  publications  related  to  the  grant. 
Presentations 


2005 

Invited seminar speaker, Department of Statistics, North Carolina State University, Raleigh,
NC; “A fast, optimal spatial prediction method for massive datasets”, January 2005.

Invited seminar speaker, Mathematics Laboratory, Universite de Paris Sud, Orsay, France;
“A fast, optimal spatial-prediction method for massive datasets”, March 2005.

Invited seminar speaker, Division of Mathematical and Information Sciences, CSIRO, Perth,
Australia; “Geostatistical prediction of spatial extremes and their extent”, April 2005.

Invited seminar speaker, Department of Statistics, University of Padova, Padova, Italy;
“Dynamic multi-resolution spatial models”, June 2005.

Presented an invited paper at SAMSI Workshop on Bridging Statistical Approaches and
Sequential Data Assimilation, Research Triangle Park, NC; “Data assimilation using
multi-resolution spatio-temporal models”, June 2005.

Co-authored a contributed poster (with J. Zhang and P. Craigmile) at Joint Statistical
Meetings, Minneapolis, MN; “Predicting exceedance regions for geostatistical processes”,
August 2005.


2006 

Presented a keynote address to 2006 White Conference on Mastering the Data Explosion in
the Earth and Environmental Sciences, Australian Academy of Science, Canberra,
Australia; “Spatial prediction for massive datasets”, April 2006.

Dan and Carol Burack President's Distinguished Lecturer, University of Vermont,
Burlington, VT. Presented Burack lecture; “Massive but sparse spatial data”, May 2006.

Presented an invited paper (with T. Shi) at Second NASA Data Mining Workshop: Issues
and Applications in Earth Sciences, Pasadena, CA; “Satellite data: Massive but sparse”, May
2006.

Presented  an  invited  paper  (with  G.  Johannesson)  at  American  Statistical  Association  Annual 
Meeting,  Seattle,  WA;  “Fixed  rank  kriging  for  massive  datasets”.  Also  co-authored  a 
contributed paper (with C. Huang, Y. Yao, and T. Hsing); “Cokriging with generalized
cross-covariances for detecting radioactivity”. Also co-authored a contributed paper (with Y. Yao);
“Spatial  multivariate  EOFs:  Discrete  to  continuous  approximations”.  Also  co-authored  a 
contributed  paper  (with  J.  Zhang  and  P.  Craigmile);  “Predicting  spatial  exceedance  regions”. 
Also  co-authored  a  contributed  paper  (with  H.  Li  and  C.  Calder);  “Testing  for  spatial 
dependence  based  on  the  SAR  model”,  August  2006. 

Presented  the  keynote  address  at  METMA3,  International  Workshop  on  Spatio-Temporal 
Modelling,  Pamplona,  Spain;  “Spatio-temporal  satellite  data  processing”,  September  2006. 

Presented  an  invited  paper  at  the  International  Symposium  on  Statistical  Analysis  of  Spatio- 
Temporal  Data,  Tokyo,  Japan;  “Spatio-temporal  satellite  data  processing”,  November  2006. 

2007 

Invited  seminar  speaker,  Institute  of  Statistics  and  Decision  Sciences,  Duke  University, 
Durham,  NC;  “Optimal  spatial  prediction  for  large  spatial  datasets”,  February  2007. 

Invited  seminar  speaker,  SAMOS,  Universite  de  Paris  1  (Sorbonne),  France;  “Spatial 
prediction  for  massive  datasets”,  March  2007. 

Presented  five  lectures  as  Principal  Lecturer,  32nd  Spring  Lecture  Series  on  Spatial  and 
Spatio-Temporal  Statistics,  University  of  Arkansas,  Fayetteville,  AR,  April  2007. 

Presented  an  invited  paper  at  Workshop  Spatial  Statistics,  Universite  de  Paris  1  (Sorbonne), 
France;  “Predicting  spatial  exceedance  regions”,  April  2007. 

Co-authored a contributed paper (with C. Huang and T. Hsing) at American Statistical
Association Annual Meeting, Salt Lake City, UT; “Spectrum estimation for isotropic
intrinsically  stationary  spatial  processes”.  Also  co-authored  a  contributed  paper  (with  H.  Li 
and  C.  Calder);  “Exploratory  spatial  data  analysis  using  APLE  statistics”.  Also  co-authored 
a  contributed  paper  (with  L.  Kang  and  D.  Liu);  “Spatial  statistical  analysis  of  doctors' 
prescription  amounts  by  region”.  Also  co-authored  a  contributed  paper  (with  T.  Shi); 
“Spatio-temporal  processing  of  MISR's  aerosol  optical  depth  data”,  July  2007. 

Presented  an  invited  paper  (with  A.  Braverman  and  H.  Nguyen)  at  2007  American 
Geophysical  Union  Fall  Meeting,  San  Francisco,  CA;  “Fusing  measurements  statistically: 
Combining  aerosol  data  from  MISR  and  MODIS”,  December  2007. 
Publications: Refereed Articles

Craigmile,  P.  F.,  Cressie,  N.,  Santner,  T.  J.,  and  Rao,  Y.  (2006).  Bayesian  inference  on 
environmental exceedances and their spatial locations. Extremes, 8, 143-159.

Cressie,  N.  (2006).  Block  kriging  for  lognormal  spatial  processes.  Mathematical 
Geology,  38,  413-443. 

Cressie, N. and Verzelen, N. (2007). Conditional-mean least-squares fitting of Gaussian
Markov random fields to Gaussian fields. Computational Statistics and Data Analysis,
52, 2794-2807.

Li,  H.,  Calder,  C.  A.,  and  Cressie,  N.  (2007).  Beyond  Moran's  I:  Testing  for  spatial 
dependence  based  on  the  SAR  model.  Geographical  Analysis,  39,  357-375. 

Sain,  S.  and  Cressie,  N.  (2007).  A  spatial  model  for  multivariate  lattice  data.  Journal 
of Econometrics, 140, 226-259.

Shi,  T.  and  Cressie,  N.  (2007).  Global  statistical  analysis  of  MISR  aerosol  data:  A 
massive  data  product  from  NASA's  Terra  satellite.  Environmetrics,  19,  665-680. 

Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial
data sets. Journal of the Royal Statistical Society, Series B, 70, 1-18.

Cressie,  N.  and  Kapat,  P.  (2008).  Some  diagnostics  for  Markov  random  fields.  Journal  of 
Computational  and  Graphical  Statistics,  forthcoming. 

Zhang,  J.,  Craigmile,  P.F.,  and  Cressie,  N.  (2008).  Loss  function  approaches  to  predict  a 
spatial  quantile  and  its  exceedance  region.  Technometrics,  forthcoming. 

Publications:  Non-refereed  articles 

Cressie,  N.  and  Yao,  Y.  (2005).  Release  of  Web-Project:  TCO,  showing  spatial  prediction 
of  total  column  ozone  over  the  globe  using  a  fast  multi-resolution  spatial  statistical  model 
(http://www.stat.osu.edu/~sses/collab_ozone.php).

Ganguly,  A.R.,  Hsing,  T.,  Katz,  R.,  Erickson,  D.J.,  Ostrouchov,  G.,  Wilbanks,  T.J.,  and 
Cressie,  N.  (2005).  Multivariate  dependence  among  extremes,  abrupt  change,  and 
anomalies  in  space  and  time  for  climate  applications,  in  Proceedings  of  the  International 
Workshop on Data Mining Methods for Anomaly Detection, eds D. Margineantu, S. Bay,
P. Chan, and T. Lane, 25-26.

Cressie,  N.  and  Johannesson,  G.  (2006).  Spatial  prediction  for  massive  datasets,  in 
Mastering  the  Data  Explosion  in  the  Earth  and  Environmental  Sciences:  Proceedings  of  the 
Australian  Academy  of  Science  Elizabeth  and  Frederick  White  Conference.  Australian 
Academy of Science, Canberra, Australia (11 pp.)

Calder,  C.  and  Cressie,  N.  (2007).  Some  topics  in  convolution-based  spatial  modeling,  in 
Proceedings  of  the  56th  Session  of  the  International  Statistical  Institute,  Lisbon,  Portugal, 
forthcoming. 

Sain,  S.R.,  Furrer,  R.,  and  Cressie,  N.  (2007).  Combining  regional  climate  model  output  via 
a  multivariate  Markov  random  field  model,  in  Proceedings  of  the  56th  Session  of  the 
International  Statistical  Institute,  Lisbon,  Portugal,  forthcoming. 

Articles submitted/in preparation

Huang,  C.,  Cressie,  N.,  Yao,  Y.,  and  Hsing,  T.  (2007).  Multivariate  intrinsic  random 
functions  for  cokriging,  under  revision  for  Mathematical  Geosciences. 

Huang,  C.,  Hsing,  T.,  and  Cressie,  N.  (2007).  On  a  general-spline  estimator  for  the 
spectral  density  function,  under  journal  review. 

Huang, C., Hsing, T., Cressie, N., Ganguly, A.R., Protopopescu, V.A., and Rao, N.S.
(2007). Statistical analysis of plume model identification based on sensor network
measurements, under revision for Transactions on Sensor Networks.

Kang,  L.,  Liu,  D.,  and  Cressie,  N.  (2007).  Statistical  analysis  of  small-area  data  based  on 
independence,  spatial,  non-hierarchical,  and  hierarchical  models,  under  journal  review. 